On the String Translations Produced by Multi Bottom-Up Tree Transducers

Author

  • Daniel Gildea
Abstract

Many current approaches to syntax-based statistical machine translation fall under the theoretical framework of synchronous tree substitution grammars (STSGs). Tree substitution grammars (TSGs) generalize context-free grammars (CFGs) in that each rule expands a nonterminal to produce an arbitrarily large tree fragment, rather than a fragment of depth one as in a CFG. Synchronous TSGs generate tree fragments in the source and target languages in parallel, with each rule producing a tree fragment in either language. Systems such as that of Galley et al. (2006) extract STSG rules from parallel bilingual text that has been automatically parsed in one language, and the STSG nonterminals correspond to nonterminals in these parse trees. Chiang’s (2007) Hiero system produces simpler STSGs with a single nonterminal.
STSGs have the advantage that they can naturally express many re-ordering and restructuring operations necessary for machine translation (MT). They have the disadvantage, however, that they are not closed under composition (Maletti et al. 2009). Therefore, if one wishes to construct an MT system as a pipeline of STSG operations, the result may not be expressible as an STSG. Recently, Maletti (2010) has argued that multi bottom-up tree transducers (MBOTs) (Lilin 1981; Arnold and Dauchet 1982; Engelfriet, Lilin, and Maletti 2009) provide a useful representation for natural language processing applications because they generalize STSGs, but have the added advantage of being closed under composition. MBOTs generalize traditional bottom-up tree transducers in that they allow transducer states to pass more than one output subtree up to subsequent transducer operations. The number of subtrees taken by a state is called its rank. MBOTs are linear and non-deleting; that is, operations cannot copy or delete arbitrarily large tree fragments.
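
The contrast between CFG and TSG rules described above can be made concrete with a small sketch. The nested-tuple tree encoding, the helper names `depth` and `frontier`, and the example rules are illustrative assumptions, not taken from the paper.

```python
# Hypothetical encoding (not from the paper): a tree is a nested tuple
# (label, child, child, ...); a bare string is a leaf, which may be either
# a terminal word or a nonterminal waiting to be expanded by another rule.

def depth(tree):
    """Number of edges on the longest root-to-leaf path."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(child) for child in tree[1:])

def frontier(tree):
    """Leaves of the fragment, read left to right."""
    if isinstance(tree, str):
        return [tree]
    leaves = []
    for child in tree[1:]:
        leaves.extend(frontier(child))
    return leaves

# A CFG rule S -> NP VP yields a fragment of depth one.
cfg_rule = ("S", "NP", "VP")

# A TSG rule may expand S into a larger fragment in a single step.
tsg_rule = ("S", "NP", ("VP", ("V", "saw"), "NP"))

print(depth(cfg_rule))     # 1
print(depth(tsg_rule))     # 3
print(frontier(tsg_rule))  # ['NP', 'saw', 'NP']
```

Under this encoding, a synchronous rule could be modeled as a pair of such fragments, one per language, whose nonterminal leaves are linked; that pairing is left out here to keep the sketch short.
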
Although STSGs and MBOTs both perform operations on trees, it is important to note that, in MT, we are primarily interested in translational relations between strings. Tree operations such as those provided by STSGs are ultimately tools to translate a string

Related resources

String-to-Tree Multi Bottom-up Tree Transducers

We achieve significant improvements in several syntax-based machine translation experiments using a string-to-tree variant of multi bottom-up tree transducers. Our new parameterized rule extraction algorithm extracts string-to-tree rules that can be discontiguous and non-minimal in contrast to existing algorithms for the tree-to-tree setting. The obtained models significantly outperform the str...

The Use of Tree Transducers to Compute Translations Between Graph

The power of top-down, bottom-up, and tree-to-graph-to-tree transducers (tgt transducers) to compute translations from hyperedge-replacement algebras into edge-replacement algebras is investigated. Compositions of top-down and bottom-up tree transducers are too weak if the operations in the target algebra are powerful enough to define all series-parallel graphs, 2-trees, or related types of grap...

Extended Multi Bottom-Up Tree Transducers Composition and Decomposition

Extended multi bottom-up tree transducers are defined and investigated. They are an extension of multi bottom-up tree transducers by arbitrary, not just shallow, left-hand sides of rules; this includes rules that do not consume input. It is shown that such transducers, even linear ones, can compute all transformations that are computed by linear extended top-down tree transducers, which are a th...

Exact Decoding with Multi Bottom-Up Tree Transducers

We present an experimental statistical tree-to-tree machine translation system based on the multi bottom-up tree transducer, including rule extraction, tuning, and decoding. Thanks to input parse forests and a “no pruning” strategy during decoding, the obtained translations are competitive. The drawbacks are a restricted coverage of 70% on test data, in part due to exact input parse tree matching...

Composition and Decomposition of Extended Multi Bottom-Up Tree Transducers

Extended multi bottom-up tree transducers are defined and investigated. They are an extension of multi bottom-up tree transducers by arbitrary, not just shallow, left-hand sides of rules; this includes rules that do not consume input. It is shown that such transducers can compute all transformations that are computed by linear extended top-down tree transducers (which are a theoretical model for...

Journal:
  • Computational Linguistics

Volume 38

Publication date: 2012